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Abstract 

The time dependent non-equilibrium radiation diffusion equations are im- 
portant for solving the transport of energy through radiation in optically 
thick regimes and find applications in several fields including astrophysics 
and inertial confinement fusion. The associated initial boundary value prob- 
lems that are encountered often exhibit a wide range of scales in space and 
time and are extremely challenging to solve. To efficiently and accurately 
simulate these systems we describe our research on combining techniques 
that will also find use more broadly for long term time integration of non- 
linear multiphysics systems: implicit time integration for efficient long term 
time integration of stiff multiphysics systems, local control theory based 
step size control to minimize the required global number of time steps while 
controlling accuracy, dynamic 3D adaptive mesh refinement (AMR) to mini- 
mize memory and computational costs, Jacobian Free Newton-Krylov meth- 
ods on AMR grids for efficient nonlinear solution, and optimal multilevel 
preconditioner components that provide level independent solver conver- 
gence. 

Keywords: Adaptive mesh refinement, Jacobian Free Newton-Krylov, 
implicit methods, non-equilibrium radiation diffusion, multilevel solvers, 
timestep control 
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1. Introduction 

In the fields of astropliysics and inertial confinement fusion tlie time 
dependent non-equilibrium radiation diffusion equations are important for 
solving the transport of energy through radiation in an optically thick 
regime. In this paper we employ a form of the model that has a fiux-limited 
diffusion approximation (gray approximation) for the energy density cou- 
pled to a material temperature equation that incorporates a nonlinear mate- 
rial conduction term [H 121 El IH E] • This nonlinear, coupled, time dependent 
set of partial differential equations (PDEs) exhibits multiple temporal and 
spatial scales, and the associated initial boundary value problems are highly 
stiff and challenging to solve. As a result, they are also an excellent testbed 
for the development of simulation methods for long term time integration 
of stiff multi-physics systems. 

In this paper we will limit our scope to fully implicit time integration 
methods. This then enables the use of timestep control methods based 
on accuracy considerations and enables us to leverage the theoretical ad- 
vances for accuracy based timestep control that exist in the field of ordi- 
nary differential equations (ODEs). We experiment with different adaptive 
timestep control methods including control theory based approaches that 
attempt to monitor and control the temporal accuracy at each timestep 
and minimize the total number of timesteps required over the course of the 
simulation. The use of control theoretic approaches to timestep control is 
new for radiation-diffusion calculations and is only beginning to be used for 
multi-physics calculations. Variable step fully implicit time integration is 
combined with 3D dynamic structured adaptive mesh refinement (AMR) [B] 
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with an objective towards minimizing the total number of degrees of free- 
dom required over the course of a simulation. Care is however required in 
combining these techniques as spatial regridding during dynamic AMR can 
introduce non-stiff transient errors that significantly affect the behavior of 
timestep control algorithms and can lead to a dramatic increase in the total 
number of timesteps required over uniform spatial grid calculations when 
not properly controlled. We will report on our experiences with different 
timestep controllers and the modifications required in this context for AMR 
as the literature on this topic, particularly for multi-physics simulations, is 
sparse. Fully implicit time integration methods require highly efficient so- 
lution of the nonlinear systems at each timestep in order to be competitive 
with other methods. Here, we choose to use Jacobian Free Newton-Krylov 
(JFNK) methods with physics based preconditioning. JFNK methods allow 
us to avoid the formation of the full Jacobian matrices across AMR grid hier- 
archies which can be problematic and programming intensive for flux based 
finite volume discretizations on 3D AMR grid hierarchies which incorporate 
coarse-fine interpolation across grid levels. JFNK methods often obtain 
their efficiency from careful design of preconditioners. Efficient precondi- 
tioners on uniform grids for JFNK methods often employ multigrid solvers 
to tackle elliptic components to deliver grid independent performance. In 
the context of AMR, particularly for problems with elliptic components, 
preconditioner performance can degrade as the number of refinement levels 
in the AMR hierarchy increases if proper care is not paid to coupling be- 
tween levels. By employing suitable multilevel preconditioner components 
we will demonstrate level independent performance of our nonlinear solvers 
for non-equilibrium radiation diffusion applications. 

The remainder of this paper is organized as follows. Section 2 of this pa- 
per surveys related work in the context of equilibrium and non-equilibrium 
radiation diffusion problems. Section 3 describes the model problem and its 
temporal and spatial discretization. Section 4 describes the JFNK method 
and the multilevel preconditioners employed. Section 5 presents numeri- 
cal results and Section 6 presents conclusions and directions for future and 
ongoing work. 

2. Related work 

In [7], Rider, Knoll and Olson introduced the idea of physics based pre- 
conditioning in ID for non-equilibrium radiation diffusion problems. Fur- 
ther work by Mousseau, Knoll, Rider |1] and Mousseau, Knoll [5] extended 
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this methodology to problems in 2D on uniform grids. Their work related 
to physics-based preconditioning will be leveraged here with major exten- 
sions for 3D AMR grids and multilevel preconditioners. In Mavriplis 
compared two different approaches to solving the nonlinear systems at each 
timestep by considering Newton- Multigrid and Full Approximation Scheme 
(FAS) using agglomeration ideas on unstructured grids for this problem. In 
[2] Olson considers the use of efficient operator split time integration schemes 
on uniform grids. Work by Lowrie et. al.|8j compares different time inte- 
gration methods for non-equilibrium radiation diffusion while Brown, Shu- 
maker, Woodward [9] focus on fully implicit methods and high order time in- 
tegration on uniform grids. The motivation to consider automatic timestep 
control in our work was partially derived from [9]. We build on their work 
to further consider the use of the control theory based timestep controllers 
that provide computational stability as described in jTU] and related ref- 
erences and consider modifications that are required for AMR. Glowinski, 
Toivanen [H] consider using automatic differentiation and system multigrid. 
Shestakov, Greenough, and Howell [12] consider pseudo-transient continu- 
ation on AMR grids using an alternative formulation. Also worth men- 
tioning is related work for equilibrium radiation diffusion problems. Stals 
[T3] compares the performance of Newton-Multigrid and FAS with local 
refinement on unstructured grids and Pernice, Philip [14j use JFNK with 
a Fast Adaptive Composite Grid (FAG) preconditioner on AMR grids for 
single physics equilibrium radiation-diffusion on structured adaptive mesh 
refinement (SAMR) grids. 

3. Problem formulation and discretization 

3.1. Model problem 

The non-dimensional model equations considered in this paper are given 
by P El El SI i: 

dE 

— -V-{DeVE) = a,{T'-E) in (1) 

— -V-PtVT) = -a,{T^-E) infi, (2) 

where E is the radiation energy density, T the material temperature, V 
the gradient, V- the divergence operator, and De and are nonlinear 
diffusion coefficients given by 
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De = —, 
3cra 

Dt = kT'^, 

where aa is the photon absorption cross section. cTq is modeled by a consti- 
tutive law of the form 

aa = z'T-' (3) 

with z being the material atomic number and k = 0.01 in our experiments. 
D^; is usually flux limited to prevent non-physical effects and we use the 
Wilson limiterlTSl: 



(3a. + 

For the purposes of this paper this model will suffice, though more sophis- 
ticated models suited for production level codes are considered in jTB]. 

An initial boundary value problem (IB VP) for ([l])-([2]) is posed on the 
unit cube domain, Q = [0, 1]^, with initial conditions 

E = Eo, T=(Eo)3 att = 0, (4) 
and boundary conditions 

1 E 

-n ■ DrVE +j = R on dQn, t > 0, (5) 

n ■ DrVE = on dQj^, t > 0, (6) 

n . VT = ondn, t>0, (7) 

with dfl = dVL-ji U dVLj^, dfl-ji fl dflj^ = 0. The Robin boundary conditions 
for the energy density are defined on dflji, which consists of the left and 
right faces at x = and x = 1, while Neumann boundary conditions are 
imposed on all other faces. Following [7] , we do not flux limit the boundary 
conditions in ([s]). 
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3.2. Spatial discretization 

Let VL = [xi, Xh] X [yi, yh] x [zi, z^] be a rectangular computational domain. 
We discretize the domain in the x direction by subdividing [x/,a;/(] into Ux 
subintervals with centers Xi = xi + {i + ^)hx with hx = {xh — xi)/nx for 
i = 0, . . . ,nx — I- Each subinterval has faces located at x. i = Xi — hx/2 

and X. 1 = Xi + hx/2. The y and z directions are similarly discretized by 

subdividing [yi,yh] and [zi, Zh] into Uy and subintervals with grid spacings 
hy = {Vh — yi)/^y and hz = {zh — zi)/nz respectively. The tensor product 
of these subintervals partitions Vl into a collection of computational cells 
VL^ = {fiij,k} each of size hx x hy x hz centered at coordinates {xi,yj,Zk). 
These ideas are readily extended to the case where f2 is a union of non- 
overlapping rectangular regions, and we continue to use the same notation 
fl^ to denote such a collection of computational cells. Such regular grids 
are in widespread use in computational science and engineering, and high 
quality software that is tuned to regular grids, such as software for geometric 
multigrid methods, is available. 

We now describe how the above approach can be extended to construct 
hierarchies of structured regular grids to form adaptive mesh refinement 
(AMR) hierarchies. Let L > 1 and Qi = Q D ^ ■ • - ^l be a nested 
set of subdomains of the computational domain Q. For simplicity, assume 
that each Qe, 2 < £ < L is a union of non-overlapping logically rectangular 
regions; these are the subregions of Q where additional resolution is desired. 
A composite structured AMR (SAMR) grid on f2 is a nested hierarchy 
of grids Qi'^ D fig^ D ■ • ■ D Q'I^ consisting of L levels, with mesh spacing 
hi > /i2 > ■ ■ ■ > hi, with the coarsest grid il^^ covering il. Each level 
Q^' consists of a union of non-overlapping regions, or patches, at the same 
resolution h£. When there is no risk of confusion we will drop the i subscript 
and simply refer to fl^. This hierarchical representation allows operations 
on Q'^ to be implemented as operations on individual levels which in 
turn are decomposed into operations on individual patches. This property 
facilitates reuse of software written for regular grids. Figure [T] shows a 
SAMR grid with L = 3 and one patch on each of the refinement levels. 
Note that while each level is nested in the next coarser level, there is no 
requirement that a patch at one refinement level is nested fully in a patch 
at another refinement level, i.e., a fine patch at refinement level / may lie 
over one or more coarser patches at refinement level (/ — 1). 

Having described the decomposition of a structured AMR hierarchy we 

now describe the discretization of ([l|-(|2]) on the SAMR grid hierarchy. We 
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Figure 1: Example of a multilevel SAMR grid with three levels. 



begin by describing the spatial discretization on a single regular grid fol- 
lowed by the necessary modifications for SAMR. A method of lines (MOL) 
approach is used where we first discretize in space to obtain a set of coupled 
ODEs for the variables at each spatial location, followed by discretization 
in time. Only the spatial discretization for ([T]) will be described in detail 
noting that a similar procedure is used to discretize ([2]). Integrating ([T]) over 
a cell volume fiij,k and using the Gauss theorem for the diffusion terms we 



obtain: 



dE 
~dt 



<ya{T' - E) 



dV 



- j (DBV^)-ndA = 



Let -Eij,fc and Tjj^fc denote discrete variables collocated at cell centers in- 
dexed by (z, j, k) approximating E and T. The volumetric integral terms in 
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8|) are approximated by 
dE 



dt 



dV 



dE, 



dt 



The surface integral diffusive flux term in (jsj) is evaluated approximately as 



with 



da 



'''X 



da 



da^ j If 



riy 

{DEEy\^^.^^,dA ^ pi.).,+|,.^^^^^±4^^/../.a2) 



,j,k 



The face centered diffusion coefficients, -D_e in ([9])- (14) are computed 
following the description in [16]. First, a face centered T value based on 
arithmetic averaging of adjacent cell values is computed 



Alternatives for computing the face centered temperature described in [16] 
include geometric or parametrized arithmetic-geometric averages. Flux 
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matching for energy conservation with photon absorption cross sections eval- 
uated at the face centered temperature leads to the following expression for 
a harmonically averaged left face centered diffusion coefficient which is at 
first not flux limited: 



i-^r)i-^j^k ~ n ( 3 _|_ 3 A 



A flux limited -D^ is then obtained from Dr as: 



Similar expressions apply for De d.t other faces. We note that several al- 
ternate problem dependent choices for discretizing the diffusion coefficients 
are detailed in [16]. 

At physical boundaries the values of E and T are extrapolated to ghost 
cells using a first order scheme and the ghost values are used in evaluating 
the flux terms required at the physical boundary faces. 

3.2.1. Modifications for Structured Adaptive Mesh Refinement 

Coarse-fine stencils: The standard finite volume discretization of ([T]) 
and (|2| leads to regular stencil patterns for cells that lie in the interior of a 
patch with each cell being connected to its six immediate neighbors normal 
to the faces of the cell. This promotes regular array access, minimizes cache 
misses and allows for the reuse of software written for regular uniform grids. 
However, 3D structured AMR with variable size patches on all levels can 
lead to complex geometric interactions between patches. Cells on the surface 
of a patch can lie on either the faces, edges, or corners of the patch and can 
be adjacent to surface cells from patches on the same and/or the adjacent 
coarser refinement level leading to irregular stencil patterns. One commonly 
used approach to restore regular array access and implicitly account for the 
irregular stencils is to interpolate coarse cell data into ghost cells aligned 
with the fine surface cells. This does require accounting for all the possible 
coarse-fine and fine-fine patch interactions and is programming intensive. 



In Figure |A.11| of the appendix, we classify the different types of ghost 
cells for a representative example patch, referring to them collectively as 
coarse fine boundary fragments (CFBFs). As mentioned earlier, in 3D, very 
complex configurations of patches can show up and a patch can possibly 
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■ Ci{i = 1,2,3,4) coarse ghost cell value coarse ghost cells 

^ i interpolated value fine ghost cells 

• / fine cell value fine cell 

tt 9 fine ghost cell value 



Figure 2: Coarse fine boundary interpolation. 



have all the types of CFBFs listed. A representative example of the types 



of configurations encountered is shown in Figure |A.12 



For the purpose of this application linear interpolation at coarse-fine 
boundaries was sufficient to obtain second order accuracy as will be seen 
from our numerical results. A further reason to use linear interpolation is 
that higher order interpolation methods in this case also suffer from the 
potential to overshoot and produce non-physical negative energy and tem- 
perature values. We briefly describe linear interpolation of aligned fine ghost 
cells on patch faces using both coarse and fine cell data, noting only that 
linear interpolation for more complex configurations required modifications 
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to this basic procedure that needed to be handled on a case by case basis. 

Figure [2] illustrates linear interpolation into a fine ghost cell using both 
coarse and fine values. Standard bilinear interpolation of the four coarse 
ghost cell values Ci{i = 1,2,3,4) in the upper right figure (whose 2D pro- 
jection is in the lower left figure) is used to obtain a coarse value i aligned 
with the fine interior cell /. This value, while aligned is not at the fine 
ghost cell center. Linear interpolation normal to the face between values i 
and / is then used to obtain the aligned and centered fine ghost cell value 
g (lower right figure). These aligned and centered fine ghost cell values 
are now indistinguishable from interior fine cell values for the purposes of 
stencil operations. 




Figure 3: Flux. 



Flux calculations: Nonlinear and linear residual calculations require the 
calculation of the fiuxes ([9])-( 14 ). For fiux conservation at coarse-fine bound- 



aries, fine fiuxes are averaged and are used in place of the fiux calculated 
using coarse values only as shown in Figure ([s]). 

3.3. Time discretization 

Following the spatial discretization of the previous section we obtain a 
system of coupled ODEs over the AMR grid that we denote by 

u = f(u). (15) 

Here u and u respectively denote the vector of discrete unknowns and time 
derivatives for E and T across all cells. The previously cited papers, par- 
ticularly O m [9] discuss the potential merits of various time integration 
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schemes. In this paper we choose imphcit time integration schemes due to 
the extreme stiffness of the systems considered as they allow us to step over 
the smallest time scales in the problem and instead advance at the dynami- 
cal time scale of interest. Furthermore, this enables us to leverage timestep 
control algorithms based on controlling accuracy from the ODE literature. 
We use a variable timestep backward differentiation formula of fixed order 
2 (BDF2) for the numerical experiments presented. Alternate implicit time 
discretizations such as variable order BDF or implicit Runge-Kutta meth- 
ods are possible [T7j, but this is left as a topic for future investigation. Let 
/S.tn denote the n-th timestep and let a„ = At„/At„_i, then a variable step 



BDF2 discretization of (15) is given by 



— u„+i - (1 + a„)u„ + — u„_i = At„f (un+i) (16) 

1 + ttn / \l + OLn) 

where vl^ denotes the computed solution at the n-th timestep. Note that for 
the very first timestep the solution at two previous timesteps is not available 
and a BDFl method (backward Euler) 

U„+i = U„ + At„f(Un+i) (17) 

is used instead. 

3.J^. Timestep control 

In order to balance the competing objectives of accuracy and minimizing 
the number of required timesteps a method of timestep control is required. 
Within our implementation we experimented with three different algorith- 
mic approaches to timestep control. The first approach to timestep control 
is based on limiting the percentage change in energy and temperature as 
described in [2l [TU [18]. While simple to implement, this can be viewed as 
a truncation error strategy based on first order time integrators [12] po- 
tentially limiting the timestep unnecessarily. We found this to be true in 
practice with difficulties in choosing the necessary heuristic parameters to 
provide an appropriate balance between accuracy and efficiency. Choosing 
too small a percentage change led to small timesteps and increased solution 
costs while choosing too large a change led to instabilities with negative 
overshoots in the energy values. 

The second strategy considered was the traditional ODE timestep con- 
troller [ini |20l [21] based on controlling the local error per step (EPS): 



Atn- 



\en\ 



'At„, (1^ 



with k = 3 for a second order method such as BDF2. et is a target user 
selected error tolerance, e„ is an estimate of the local timestep error, and 
II • II is a norm to be specified. We note that Brown et. al. p] use this 
strategy in the context of non-equilibrium radiation diffusion problems on 
uniform grids. In a series of papers [22], [23], IMl HDl EH] |26], Gustafsson, 
Soderlind and collaborators developed and analyzed timestep control al- 
gorithms including the standard ODE timestep controller described above 
using systematic control theory approaches. They proposed new timestep 
controllers that address issues of efficiency, computational stability, and 
tolerance proportionality that can arise with the traditional ODE timestep 
controller as implemented in ODE time integration packages. We consider 
the class of proportional-integral controllers (PI controllers) [10] given by: 





ki 




e„-i 


kp 


VI 


1 \ 






1 1 





an+i = — rj -n — rr «n, (19) 



and in particular, we choose the PCA.7 controller of [10] with kkj = 0.4 
and kkp = 0.7 with k = 3 for the BDF2 method. 

While we are able to control the accuracy extremely well with both of the 
latter approaches on uniform grids we found that modifications are required 
in the context of dynamic AMR. Dynamic regridding requires the transfer of 
solution data from an existing to a new AMR hierarchy. This introduces in- 
terpolation errors that act as non-stiff transient error components affecting 
the computation of the local time error estimates, e„ that are based on the 
previous time history. The timestep controllers typically respond by dra- 
matically reducing the timestep requiring a significant number of timesteps 
before the timestep is back to the dynamical timescale of the problem lead- 
ing overall to an inefficient simulation. Similar effects have been noted 
before by Petzold[27] in a report on ID moving grid (r— refinement) meth- 
ods for PDEs, Trompert and Verwer [28] for 2D parabolic PDEs system 
simulations on structured AMR grids, and Hyman et. al [29] for hybrid 
moving mesh- static regridding approaches. The approach advocated in 
[2Z] is to use an implicit filtered truncation error estimator while Trompert 
and Verwer use a first order truncation error estimator to avoid non-stiff 
transients affecting the error estimates. However, as pointed out previously 
this can unnecessarily restrict the timestep. 

3.5. Local time error estimation 

The timestep controllers described in (18)-(19) require an estimate of 
the local time error, e„. At the beginning of each timestep a generalized 
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leapfrog method [30] given by 



u 



n+l 



1 + an) At„u^ 



u 



n-1 



(20) 



is used to provide an initial guess for the solution at the current step as 
well as in computing an estimate of the local error for the step. Here u„ is 
the vector of discrete time derivative unknowns at time step n. Following 
[SO] , componentwise Taylor series expansions for the exact solution u(t„_,_i), 
the predictor u^j^-^, and the BDF2 solution u„,_|_i can be used to derive the 
following expressions for the local errors for the generalized leapfrog and 
BDF2 methods given by 



BDF2: 



u 



n+l 



u(t„+i) 
Generalized leapfrog: 



(At„ + Atn-l] 



At„, (2At„ + At„_i) 6 



f^l,{3). 



U 



n+l 



u(t 



n+l J 



- 1 + 



At, 



n-l 



At, 



^ (3) 

6 " ■ 



(21) 



(22) 



From these expressions an approximation for the local error can be derived 
given by 



+ 1 

3a„. + 2 



(u 



n+l 



U 



n+l 



) 



In (18) and (19) the norm of the error is a scaled max norm 



where 



max 



^n,i,j,k 



{En,i,j,k + Ve) 



(23) 



(24) 



(25) 



with en,i,j,k ^'^d En,i,j,k the the local time error estimate (23) and the solu- 



tion value of the energy density respectively for grid cell (Tfj, fc), and rjE a 
constant scaling. A similar expression is used to compute ||e^||. The choice 
of the scaled max norm instead of an L2 norm is important and dictated 
by the fact that the errors in our AMR simulations are typically local in 
nature when care is taken with spatial discretization. 
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Remark 1: There appears to be an error in (3.16-250) of [30] leading to a 
different expression there for the local error estimate. 



Remark 2: (20) requires an estimate for the time derivative which is com- 



puted using the left hand side of (16). As noted in [30] using the function 



evaluation directly as the expression for the time derivative term in the pre- 
dictor does introduce round off errors that can lead to predictions for the 
timestep that are an order of magnitude lower. Furthermore, evaluation of 
the nonlinear function can be potentially very costly. A further reason not 
to use the function evaluation directly is that after regridding solution data 
interpolated from the old grid hierarchy is not a solution on the new grid 
hierarchy and any function evaluation (particularly nonlinear) will suffer 
significant errors. 

4. Nonlinear solution method 

Efficient simulation of time dependent nonlinear multi-physics systems 
using implicit time integration requires attempting to minimize the total 
number of required timesteps over the simulation time interval through 
timestep control, as well as efficient solution of the nonlinear systems that 
arise at each timestep. Following [T3] an inexact Newton approach, in par- 
ticular, a preconditioned Jacobian Free Newton-Krylov (JFNK) method is 
used to solve the nonlinear systems at each timestep. JFNK methods have 
demonstrated their effectiveness in many applications [31] and in what fol- 
lows we briefly summarize the general approach and then detail how our 
approach exploits the multilevel nature of the AMR grid hierarchies for 
efficiency. 

4.I. Jacobian Free Newton-Krylov Methods 

Let F : — )• denote the nonlinear system at each timestep and 
consider calculating the solution u* G of the system of nonlinear equa- 
tions 

F(u*) ^ (^-^-^] u^-(l + a„)u„+f-^') u„_i-Aa(u'^) = (26) 



that arises from (16). Classical Newton's method for solving (26) generates 
a sequence of approximations u*^ to u*, where u'^"''^ = u'^ + v*^ and the 
Newton step v'^ is the solution to the system of linear equations 



F'(u'=)v^ = -F(u^), 
15 



(27) 



where F' is the Jacobian of F evaluated at u*^. Newton's method is attractive 
because of its fast local convergence properties, but for large-scale problems, 



it is impractical to solve (27) with a direct method. Furthermore, it is 



often unnecessary to solve (27) using a tight convergence tolerance when 
u'^ is far from u*, since the linearization that leads to (27) may be a poor 



approximation to F(u). Generally, it is much more efficient to employ so- 



called inexact Newton methods [32], in which the linear tolerance for (27) 
is selected adaptively by requiring that v'^ only satisfy: 

||F(u'=) + F'(u'^)v'=|| < %||F(u'=)|| (28) 

for some r/^ G (0, 1) ^32j. Appropriate choice of the forcing term rjk can lead 
to superlinear and even quadratic convergence of the iteration [33] ■ The 
algorithm we use can be found in |34j . 

Remark: A modification introduced into the general inexact Newton al- 
gorithm above due to the application was to constrain the search direction 
vectors so that the update vectors u*^ -|- v*^ maintain the positivity of E 
and T. 

While any iterative method can be used to find a v*^ that satisfies (28), 



Krylov subspace methods are distinguished by the fact that they require 
only matrix-vector products to proceed. These matrix-vector products can 
be approximated by a finite-difference version of the directional (Gateaux) 
derivative as: 

FV-)v^'^'"^ + ">-'^("^'. (29) 



which is especially advantageous when F' is difficult to compute or expensive 
to store (both being the case here due to the presence of multiple grid 
patches across the AMR grid hierarchy). 

Several approaches exist to compute the differencing parameter e j34j . 
In general, the main consideration in the choice of e is the accuracy of 



the approximation (29). A further consideration that cannot be relaxed in 



this application is maintaining the positivity of the component E and T 



values for the vector u'^ + e\ in (29). The conditions when the positivity 



constraint could be violated seem to occur most frequently immediately 
after regridding when interpolation and coarsening of solution data between 
old and new AMR grid hierarchies introduces transient numerical errors. 
The most robust choice (while more expensive than some other alternatives) 
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in our numerical experiments was to compute 



e 



^1^^ if(u,v) >6u™„||v||i 

V^u^. .^gn((u, v)^ Otherwise, 



where emach is machine precision, u^in is set to 10~^, || ■ || and || ■ ||i are 
the discrete L2- and 1-norm respectively. In our numerical experiments 
performed with double precision arithmetic, e is typically on the order of 
10-^ 

Among the Krylov methods appropriate for non-symmetric definite sys- 
tems we choose the GMRES method for its robustness |35J. A further 
advantage of GMRES when employed as part of a JFNK method is that 
the Krylov vectors are normalized, ||v|| = 1, bounding the error introduced 



in the difference approximation of ( 29 ) whose leading error term is propor- 



tional to e||v|p [|36j. The main drawbacks commonly cited with GMRES 
are that it can potentially require the storage of a large number of Krylov 
vectors making it memory intensive and that it can be compute intensive as 
its computational complexity is proportional to the square of the number of 
GMRES iterations at each Newton step. We limit the potential impact on 
efficiency of these characteristics of GMRES by minimizing the number of 
GMRES iterations required at each Newton step through a combination of 
inexact Newton methods (to prevent oversolving) and strong physics based 
preconditioning to improve the conditioning of the systems. The next sec- 
tion focuses on the details of the preconditioner and its application. 

4-2. Precondition ers 

JFNK allows us to focus on developing effective preconditioners. Right- 
preconditioning of the Newton equations is used, i.e., we solve 

{F'{u'')p-^)P^f'' = -F(u^). 



where P is the preconditioner. From (29), this requires the Jacobian- vector 
products 

F(u^)(p-.w) « F(u^+^f--w)-F(u^) 

within GMRES. These are computed in two steps. An application of the 
preconditioner, y = P~^w, yields the vector y that is then used to compute 
F(u +£y)-F(u ) ^ Having described the general approach we now describe 
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how the preconditioner P is constructed. For preconditioning the Jacobian 
systems at each Newton step are approximated by 

^k^f (1 + (tMI - PrN ■ D%V -aMT'fl 

V -""-^nl (l + a,/3„(T'=f)/-/3„V-Z^^V 

where D'^ and D!^ are the diffusion coefficients frozen at the previous New- 
ton iterate values and (3n = ^ ^a" "+1 ) Several options for precondi- 
tioning are described in the literature. We choose to extend the following 
multiplicative splitting described in [1] to AMR grids. 



where 



and 



f3nV ■ D|V 

/ - /3nV ■ D^V 



Systems involving V2 only require local cell by cell inversion of block 2x2 
systems which is an embarrassingly parallel operation over the whole AMR 
grid. Systems involving Vi on the other hand contain two diagonal variable 
coefficient elliptic operators. Such systems could be approximately solved 
using a variety of methods such as block Jacobi, ILU variants, or multigrid 
on a per patch or per refinement level basis. However, methods that fail 
to account for inter-level couplings cannot eliminate any global error modes 
that span the refinement levels resulting in preconditioner performance that 
progressively degrades as the number of refinement levels is increased. In 
principle an algebraic multigrid or semistructured multigrid method could 
be used over the whole AMR grid hierarchy to address this. Such methods 
are attractive particularly on unstructured grids where defining geometric 
grid coarsenings is hard. However, in the structured AMR context such 
methods would require the formation of irregular stencil operators across 
refinement levels, a task that in practice is extremely programming intensive 
and error prone for finite volume discretizations with coarse-fine boundary 
interpolation. Furthermore, such methods in general ignore the multilevel 
structure of the AMR grid that already exists. The multilevel method we 
choose to invert the components of with is the Fast Adaptive Composite 
Grid (FAC) method [371 EH] of McCormick et. al. that extends techniques 
from multigrid on uniform grids to exploit the natural multilevel structure 
of SAMR grid hierarchies. 
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4-3. Fast Adaptive Composite Grid (FAC) method 

FAC solves problems on AMR grids by combining smoothing on refine- 
ment levels with a coarse grid solve using an approximate solver, such as 
a V-cycle of multigrid. In order to describe the algorithm we establish the 
following notation. 

• Ri : ^'^^i fl^ and Pi : — )■ fl^ respectively denote restriction 
and interpolation operators between adjacent refinement levels. Here 
we use bilinear interpolation for Pi and simple averaging for Ri. 

• RI : Vf^ ^ Vt^ and P^ : VL^ ^ Vt'^ respectively denote restriction 
and interpolation operators between the composite AMR grid VL^ and 
an individual refinement level In practice these are expressed in 
terms of the inter-level operators Ri and Pi. 

• represents a composite fine grid discrete operator discretized over 
the AMR grid hierarchy Vl'^ and represents for example one of the 
components of V\. approximates on level I. 

With this notation we can specify one V(m,n) sweep of the FAC Method 
as in Algorithm [TJ 

Algorithm 1: FAC 

Initialize: = /'^ — C^u'^] = R'^r'^ 
foreach fi^, £ = L, . . . , 2 

Smooth m times on: C^e^ = 

Correct : u'^ = u'^ + P^e^ 

Update : r'^ = — C^u'^ 

Set : = R^^__^r^ 
Solve : C^e^ = f 
Correct: u'^ = u'^ + P^e^ 
foreach fi^, £ = 2, . . . , L 

Smooth n times on: C^e^ = 

Correct : u'^ = u'^ + P^e^ 

Algorithm [l] makes clear the multiplicative nature of FAC: the residual 
is updated with the latest correction information before each smoothing 
pass can proceed. To be fully effective, each smoothing pass must properly 
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account for the data dependencies among different patches within a refine- 
ment level as well as coarse-fine data dependencies. In our calculations, we 
use red-black Gauss-Seidel smoothing on each refinement level; we also have 
the capability to use damped point Jacobi or block Gauss-Seidel smoothing. 
The correction steps require synchronization of the composite grid solution 
to make it consistent on all refinement levels. Note that the residual update 
can, in principle, be computed on only the most recently corrected refine- 
ment level plus a small border on the next coarser level, but we have found 
that residual evaluation is not expensive enough to justify this optimiza- 
tion. On the coarsest level, we can use one V-cycle of a multigrid method. 
However, our numerical results will simply use smoothing on the coarsest 
level. 



5. Dynamic Regridding 

Dynamic AMR requires changing the patch hierarchy as the simulation 
evolves in response to a changing error profile over the domain. This leads 
to refinement and derefinement of subdomains. Heuristic error indicators 
given by 



'^i,3,k — n 1 rE' I y^^^) 



and 



^x\{-^xx)i,:) 




+ hl\{Ezz)i,j,k\ 




O.lmaxij-fc \Ei, 




hx\{,Ex)i^ 


j,k \ ~l~ 1 (-^j/)i,j,fc 1 


+ hz\{Ez)ij^k\ 




0.1 maxj E'j 





n^o.k = n 1 J... I — I — (31) 



are used to identify cells with high curvature and gradient in the energy 
density based on a user specified threshold. Here and E^x denote the 
finite difference approximations to the gradient and Laplacian in the x di- 
rection with a similar notation being used for the other terms. Tagged cells 
are grouped into patches based on the Berger-Rigoutsos algorithm ^\ as 
implemented in the SAMRAI package [ID]. Regridding is critical in time 
dependent calculations to minimize the total computational cost as the sim- 
ulation evolves. However, it is an expensive operation and can introduce 
transient error into the simulation that leads to reduced timestep sizes and 
an overall increase in the number of timesteps required unless handled care- 
fully. The error indicators above are calculated at fixed intervals ( typically 
every 10 timesteps ) and a regrid operation is triggered only when necessary. 
Following the creation of a new grid hierarchy, interpolation of data from 

the old patch hierarchy to the new is required. The converged solution at 
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a timestep on the old grid hierarchy is in general not a solution on the 
new grid hierarchy after interpolation as is often evidenced by the jump in 
the nonlinear residual. High order interpolation could in principle minimize 
this effect but leads to non-physical negative undershoots in the interpo- 
lated values of the energy and temperature and is non-conservative. Hence, 
in our calculations we use conservative linear refinement to eliminate this 
possibility. Following solution interpolation onto the new AMR hierarchy 
time integration can be restarted. A cold restart approach to regridding 
uses the interpolated solution as the initial condition to restarting the sim- 
ulation starting with a single stage method such as backward Euler. A 
warm restart continues the simulation using interpolated time history vec- 
tors. We adopt a modified version of the latter approach by first performing 
a re-solve at the current timestep using the interpolated time history vectors 
as an initial guess. This however does not eliminate small discontinuities 
in the time derivatives and introduces non-stiff transient error components 
that are detected by the timestep controller leading to reduced time step 
sizes. Furthermore, since the timestep controllers rely on accurately esti- 
mating the local time error (23) which depends on quantities that can only 
be obtained by interpolation from the old grid hierarchy the estimate (23) 
is necessarily wrong for the first timestep and is not used. 



6. Numerical results 

Numerical results are presented for a carefully chosen model problem 
that is designed to test the performance of the implicit time integrator 
and its components on fairly complex 3D AMR hierarchies with challenging 
variations in the material properties. The domain is the unit cube with 
regions containing two materials with 

{[0.0625,0.2] X [0.375,0.625] x [0.375,0.625], 
[0.125,0.375] X [0.0,1.0] x [0.0,0.125], 
[0.125,0.375] X [0.0,1.0] x [0.875,1.0], 

^1, otherwise, 

as illustrated in Figure |4j At the initial time the energy density and material 

temperature are constant over the domain with Eq = 10~^, Tq = (Eq)*. 

On the left and right faces of the domain Robin boundary conditions ([t]) 

for the energy density are imposed with R = 1 and R = respectively. 

Zero Neumann conditions are imposed for the energy density on all other 
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faces. For the material temperature, zero Neumann boundary conditions are 
imposed on all boundaries. As the simulation progresses the left boundary 
of the domain is heated up and a steep Marshak wave [H] front for the 
energy density flows from the left to the right of the domain. AMR is 
used to resolve this wave front accurately with the implicit time integration 
schemes described enabling us to timestep efficiently and accurately at the 
dynamical time scale of the problem. As the wave front hits the regions with 
high z number fairly complex AMR grid configurations are generated due 
to the dependence of the diffusion coefficients. We locate these regions 
in our numerical experiment close to the left boundary to generate complex 
AMR configurations early on. 

All numerical results presented are based on simulations integrated to 
final time t = 1.0 with a variable step BDF2 implicit integrator except for 
the 256^ equivalent simulations where the final time t = 0.5 due to limits 
on the computational resources available to us. The user set tolerance 
in (18), (19) is set to 5.0 x 10~^ for grids with resolution equivalent to 
a 32^ uniform grid and decreased by a factor of two each time the grid 
resolution is doubled. All the numerical results reported use the PC.4.7 
timestep controller described earlier. At each timestep a JFNK method 
is used with the forcing term, rik set to according to the Eisentat- Walker 
algorithm as described in [3l]. The relative and absolute tolerances are 10~^^ 
and 10~^° respectively in the nonlinear solver. The Krylov method used is 
GMRES with the maximum Krylov space dimension set to 50 though this 
limit is never reached in practice. GMRES is right preconditioned with one 
iteration of the physics based preconditioner as described previously with 
single V(1,0) cycles of FAG being used to approximately invert the diffusion 
components for temperature and energy in the preconditioner. Red-black 
Gauss-Seidel is the smoother used on all refinement levels. All computations 
are performed in double precision on a Gray XK6 machine at ORNL. The 
infrastructure for SAMR was provided by the SAMRAI framework |40j . 
the JFNK solver is a slight modification of the PETSc |l2l iSl E] SNES 
implementation of the Eisenstat- Walker inexact Newton algorithm and the 
FAG solver was part of the SAMRSolvers package developed by the authors. 



Snapshots of the time evolution of an AMR mesh and solutions are 
presented in Figure [s} In the figures, the coarsest level is an uniform 16^ 
grid, with the finest level resolution equivalent to that of an uniform 128^ 
grid. As the figures show, the AMR grid dynamically tracks the movement 
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Figure 4: Material configuration. 



of the Marshak wave front, with the energy plots in particular showing the 
relatively slow heating of the material with z{x,y,z) = 10. 

6.1. Efficiency study 
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2.61 
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2563 


6.72 










2563 


3.04 











Table 1: Average linear iterations Table 2: Average nonlinear iterations 



Having described the numerical experiment setup we now study the effi- 
ciency of the implicit time integration scheme and the nonlinear and linear 
solvers on various AMR grids. Table [T] presents the average number of linear 
iterations required at each timestep as we vary the number of refinement 
levels when using FAC as a preconditioner. Along each row we have the 
number of refinement levels for the AMR grid and along each column the 
number of grid cells on the coarse base grid. Moving diagonally in Table [T] 
from the lower left to the upper right we see that for the same effective finest 
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Pseudocolor Pseudocolor 




t = 0.5 t = 1.0 

Figure 5: Evolution of AMR mesh, energy and temperature on a 16b41 grid on the domain 
[0, 1] X [0.52, 1] X [0, 1]. From top to bottom, the figures represent the AMR mesh, energy 
density, and temperature respectively. 



grid resolution the average number of linear iterations required to solve a 
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problem on a refined grid remains approximately constant. For example, on 
average 6.43 linear iterations are required per timestep to advance the so- 
lution on a uniform grid with 128^ grid cells and on average approximately 
the same number of iterations is required to advance the solution on adap- 
tively refined grids with 16^, 32^, 64'^, and 128'^ base grids with 4, 3, and 2 
levels of refinement respectively. This demonstrates the level independent 
performance of the multilevel physics based preconditioner without which 
we would have seen the average number of linear iterations increase simply 
as a result of moving from a uniform to an AMR grid with the same res- 
olution. Moving vertically down each column we see that increasing grid 
resolution marginally appears to decrease the number of linear iterations 
required. We infer from this that the timestep selection scheme is behaving 
correctly and that the FAC preconditioning is enabling the Krylov solver 
to deliver condition number independent convergence at each Newton step. 
For all our simulations a V(1,0) (slash) cycle of FAC was used as no sig- 
nificant improvement was seen by increasing the number of pre- or post- 
smoothing steps in FAC. In Table [2] we present the average number of non- 
linear iterations per timestep as the base grid resolution and number of 
refinement levels is varied. Here again we see little variation in the required 
number of nonlinear iterations for the same effective fine grid resolution. 



I6b4l 
32b31 

64b2l — e- 
I28bll 




1000 2000 3000 4000 5000 6000 7000 8000 9000 
Timestep # 



Figure 6: Time history of nonlinear iteration counts for different AMR grids with equiv- 
alent resolution 

25 




Figure 7: Time history of linear iteration counts for different AMR grids with equivalent 
resolution 

In figures Q and the per time step nonlinear and linear iteration 
counts for AMR grids with resolution equivalent to a 128^ uniform grid are 
shown. The key '16b41' in the figures refers to an AMR grid with a coarse 
base grid having 16^ grid cells and 4 refinement levels including the base 
grid. The other keys are to be interpreted similarly. As can be seen the 
performance of the nonlinear and linear solvers is fairly comparable for the 
different grids. Major differences in iteration counts are only present at the 
regrid events where spikes are seen in the plots. 

Table [3]presents the total number of timesteps taken for each simulation. 
Little or no variation is seen moving diagonally across the table from left 
to right. Figure [8] plots the per step time step variation over the course of 
the simulations on different equivalent resolution AMR grids. Other than 
at regrid events the figure shows close tracking of the timestep sizes for the 
various grids. The periodic rise and fall of the timestep size as the simulation 
progresses is characteristic of truncation error based timestep controllers. 
From this and the previous tables on linear and nonlinear iteration counts 
we conclude that for our application the cost per timestep does not rise when 
moving from uniform grids to AMR grids of equivalent fine resolution. 

Having shown the similar performance of the AMR calculations to uni- 
form grid calculations with respect to the convergence of the solvers at each 
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Figure 8: Timestep evolution for different AMR grids with equivalent resolution 
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Table 3: Total number of timesteps 



timestep and the total number of time steps we turn our attention to the 
potential performance gains with AMR. Figure |9] plots the relative degrees 
of freedom (DOFs) required for AMR grid simulations relative to a uniform 
2563 grid calculation over several regrid events. We see that introducing one 
level of refinement and decreasing the coarse grid resolution to 128^ reduces 
the number of required degrees of freedom significantly to less than 25% of 
a uniform grid calculation. As can be seen further reductions in the number 
of degrees of freedom are obtained as the coarse grid resolution is reduced 
and more AMR levels are introduced. We note that the reduced degrees of 
freedom required for an AMR calculation have a significant benefit on the 
memory requirements for Newton-Krylov methods as the memory required 
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for storing the Krylov subspace vectors is significantly reduced. Combined 
with strong multilevel preconditioners that reduce the size of the Krylov 
subspace required at each Newton step we significantly lower the memory 
requirements for our calculations by using AMR. 
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Figure 9: Relative degrees of freedom for AMR simulations (relative to uniform 256"^ 
simulations) 

The data presented on the number of linear and nonlinear iterations 
and the total number of timesteps taken indicates that solution efficiency 
in terms of convergence rates and minimizing the total number of timesteps 
required to integrate to a given time can be maintained by strong precondi- 
tioning and modifications to the timestep control strategy after regridding. 



Figure 10 presents the wall clock time required to integrate to time t = 1.0 
for the grids shown on 128 cores of an ORNL Cray XK6 machine. We note 
that while the code has been profiled and some optimizations done further 
code optimizations are actively being pursued. In particular, preliminary 
profiling of our application indicates that there is significant room to opti- 
mize the communication intensive preconditioning phases of the simulation. 
Furthermore, based on Figure |9] we can conclude that 128 cores are proba- 
bly not required to run the grids with refinement levels as opposed to the 
uniform 256^ grid calculation as on average less than 20% DOF's are re- 
quired to perform the AMR calculations. Nevertheless, keeping the number 
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Figure 10: Timings for 256bll equivalent AMR grids on 128 cores 



of cores fixed we show that the AMR calculations do indeed significantly 
reduce the wall clock times required. We note that a significant gain is 
obtained when introducing one and two levels of refinement while the im- 
provements are less marked after that point. This is consistent with the 
data in the previous table where we note that higher levels of refinement 
did not yield much efficiency. We believe these numbers can potentially be 
improved by introduced improvements to our refinement tagging strategies 
and by optimizing parallel performance. We hope to report on that work 
in future. 

6.2. Accuracy study 

The previous section focussed on the performance gains obtained with 
AMR. In this section we will document the temporal and spatial accuracy 
for our simulations on AMR grids. 

Temporal accuracy: In the absence of analytic solutions we first compute 
reference solutions at different temporal data points in the time interval 
[0,0.5] on a uniform 128bll grid with a fixed timestep of 2.5 x 10~^. While 
maintaining the same equivalent spatial fine grid resolution to keep the con- 
tribution from the spatial discretization error the same, simulations with 
fixed timesteps of At = 2 x 10~^, 1 x 10^^ and 5 x 10~^ are run and the 
solutions at the same data points are compared to the reference solution. 
Table |4] presents the L2 norm errors for the computed energy density and 
temperature on 16b41 AMR grids when compared against the reference so- 
lution. The errors decrease by a factor of at least 4 when the timestep is 
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t 


0.05 


0.15 
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At 


L2 Error: Energy Density 
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6.00e-06 


5.10e-06 


4.70e-06 
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4.50e-06 


5.0e-05 


1.20e-06 


l.OOe-06 


l.OOe-06 


9.00e-07 


9.00e-07 



Table 4: Temporal L2 norm errors on a 16b41 grid 



halved showing the second order accuracy of the BDF2 time integration 
scheme. While not shown, for a given timestep the differences between 
the solutions on uniform and AMR grids with the same equivalent finest 
grid resolution are also significantly lower than the spatial and temporal 
discretization errors. 

Spatial accuracy: For the spatial accuracy studies, as before, since an ex- 
act solution is not available, a simulation is done on a uniform 256^ grid 
and the time steps are recorded as well as the solution at different points 
in time. Solutions obtained using the same timesteps on AMR grids (to 
account for the effect of temporal discretization error) are then compared 
to the reference solutions on the uniform reference grid. Table [5] presents 
the L2 norm of the spatial discretization error for energy density and tem- 
perature respectively . Note that these runs were single material runs with 
the atomic number 2; = 1 in the whole domain. Second order accuracy is 
obtained though some loss of accuracy is observed in the initial stages of 
the calculation. For reference the errors on a uniform 128bll grid are also 
shown in the last row for the energy density and temperature respectively. 
Table [6] presents accuracy studies for simulations on the physical domain 
as shown in Figure |4] on both AMR and a uniform grid (128bll). The en- 
ergy and temperature errors show second order accuracy before time 0.25. 
However, after time 0.25 the front of the Marshak wave hits a discontinuity 
across material interfaces and the accuracy drops. While not shown this 
drop in accuracy is seen on uniform grids also in our tests and appears to 
be related to the spatial discretization of the nonlinear diffusion coefficients 
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L2 Error: Temperature 
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4.19e-03 



Table 5: Spatial L2 norm errors 



across discontinuous interfaces. 

7. Conclusions and Future Directions 

In this paper we described research on an efficient solution methodology 
for solving 3D non-equilibrium radiation diffusion problems. Implicit time 
integration enabled time stepping at the dynamical timescale of the prob- 
lem. Control theory based step size control minimized the overall required 
number of steps while allowing us to use methods that have computational 
stabihty. Inexact Newton methods as implemented in the JFNK solver min- 
imized the work required at the outer Newton iteration for each time step 
with GMRES providing a robust solver for non-symmetric definite systems 
at each Newton iteration. The multilevel preconditioner components were 
critical in provided level independent convergence of the linear solver. 3D 
AMR minimized the computational and memory requirements at each step 
of the calculation. The individual techniques described are not new, but 
their combination and application to solving problems in radiation trans- 
port is new to our knowledge. 
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2.09e-02 


1.90e-02 


2.65e-02 


3.39e-02 


16b41 


5.53e-03 


4.43e-03 


5.45e-03 


7.63e-03 


l.lle-02 


1.64e-02 


128bll 


5.53e-03 


4.43e-03 


5.45e-03 


7.63e-03 


l.lle-02 


1.64e-02 



Table 6: Spatial L2 norm errors 



Care had to be taken in selecting and combining the individual compo- 
nents so that the overall simulation was accurate and efficient. Our expe- 
rience in developing efficient simulation methods for this application and 
more broadly time dependent nonlinear multiphysics systems is that it re- 
quires not only focussing on how the individual simulation components can 
be efficient and/or accurate but also on understanding how the interplay be- 
tween the components can enhance or degrade the efficiency and accuracy of 
the overall simulation methodology. This is true for the interplay between 
the time step control algorithm and regridding for AMR, time step control 
and the accuracy of the nonlinear solvers, and controlling the efficiency of 
the Krylov solvers through level independent preconditioner components to 
mention a few. 

In future work we hope to report on the performance of this application 
on hybrid multicore-GPU petascale platforms such as the Titan supercom- 
puter at Oak Ridge National Laboratory. The non-equilibrium radiation 
diffusion application is an excellent testbed for studying the performance 
of nonlinear multi-physics application solution components on AMR grids 
at the petascale. Enabling greater asynchrony in our solver components by 
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using algorithms such as AFACx ^51 H6] , load balancing for AMR applica- 
tions, a posteriori error estimation for finite volume AMR applications, e.g. 
|T7] , and multiphysics smoother components tuned for hybrid architectures 
are some of the areas we hope to make progress on. 
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(a) Convex face 



(b) Convex edge 



(e) Convex point 





(d) Concave edge 



(e) Concave point 




(f) Sibling edge 



Sibling point 



Figure A. 11: Different types of coarse fine boundary fragments of the darker patch. 



37 



(a) 3 patches. 



(b) The coarse fine boundary 
fragments of the right face of the 
patch on the left. 




(c) Coarse fine boundary fragments of the top face of 
the patch at the bottom. 



(d) Coarse fine boundary frag- 
ments of the front face of the 
patch at the back. 



Figure A. 12: A complex example showing different coarse fine boundary fragments. 
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